Jet shapes have the potential to play a role in many LHC analyses, for example in quark-gluon
discrimination or jet substructure analyses for hadronic decays of boosted heavy objects. Most
shapes, however, are significantly affected by pileup. We introduce a general method to correct
for pileup effects in shapes, which acts event-by-event and jet-by-jet, and also accounts for hadron
masses. It involves a numerical determination, for each jet, of a given shape's susceptibility to
pileup. Together with existing techniques for determining the level of pileup, this then enables an
extrapolation to zero pileup. The method can be used for a wide range of jet shapes and we show
its successful application in the context of quark/gluon discrimination and top-tagging.



When energetic quarks or gluons (partons) fragment, they produce collimated bunches of hadrons
known as jets. Jets mostly conserve the energy and direction of the originating parton;
consequently they have long been used at colliders as a stand-in for generic partons, as is
currently the case at CERN's Large Hadron Collider (LHC). In recent years extensive interest has
developed in going beyond this basic use: for example, to understand if the parton is a quark or
a gluon, or to identify rare cases where a single jet originated from multiple hard partons,
perhaps from the hadronic decay of a highly-boosted W, Z or Higgs boson, top quark or other
massive object. The "jet substructure" techniques being developed in this context will be crucial
to exploit the full kinematic reach of the LHC, notably the high transverse-momentum
(high-$p_t$) region, and to maximise the LHC's sensitivity to hadronic manifestations of new
physics scenarios.

Two key classes of approach are available to probe substructure: one identifies smaller "subjets"
within a larger jet and then performs selections based on the kinematics of those subjets, for
example [3]-[10]; the other involves jet-shape observables, sensitive to the geometrical spread
of the energy within the jet, e.g. [11]-[18]. Both classes appear to be powerful and viable
experimentally (see e.g. [19], [20]) and ultimate performance in exploiting jet substructure will
probably be obtained through some combination of them.

One potential show-stopper in substructure studies is the problem of pileup: with the LHC now
operating at high instantaneous luminosities, each interesting, high-$p_t$ proton-proton
collision is accompanied by dozens of additional $pp$ collisions, which add substantial
low-$p_t$ noise to the event. Pileup modifies a jet's kinematics, on average shifting its $p_t$
in proportion to the level of noise in the event and to the jet's extent, or "area" [21], in
rapidity ($y = \frac{1}{2}\ln\frac{E+p_z}{E-p_z}$) and azimuth ($\phi$). Two techniques are in
common use to correct for this: the removal of an "offset" from the jet in proportion to the
number of observed pileup events [22]; and the "area-median" method, which subtracts an amount
given by the product of the event's measured pileup $p_t$ density ($\rho$) and the jet's measured
area ($A$) [23]-[25].¹ While the second of these methods can also be applied straightforwardly to
subjets, jet shapes have so far proved more challenging to correct.

Jet shapes are particularly sensitive to pileup because its diffuse soft energy flow is
characteristically different from the more collimated distribution of energy due to normal jet
fragmentation. One can attempt to mitigate pileup's impact by determining the shape using just
charged tracks, or by breaking a jet into subjets and using just the hardest subjets; but both
methods throw away a significant fraction of the original particles contributing to the jet's
shape, introducing a bias. One can also carry out analytical calculations of a given shape's
sensitivity, as in Refs. [13, ?], or add in particles from a "complementary" cone at 90 degrees
to the jet's axis in order to determine an average sensitivity [28]. These methods have so far,
however, been limited to specific observables, to restricted classes of jets (e.g. circular
jets), or to low pileup. The intent of this letter is to develop an effective, simple, general
method to correct jet shapes for pileup.

Our approach is related to the area-median method, which has been found to be beneficial in both
ATLAS [24] and CMS [?] (see also [29]). It is intended to be valid for arbitrary jet algorithms
and generic infrared and collinear safe jet shapes,² without the need for dedicated analytic
study of each individual shape variable. It also involves an extension of the original
area-median prescription to account for hadron masses.

¹ With particle flow [?], one can also directly discard the charged component of pileup [25].
The remaining neutral part must be subtracted in some other way (it may not be the expected
fraction of the charged component, due to detector effects).

² For the correction of collinear unsafe quantities, e.g. fragmentation function moments, as
used for quark/gluon discrimination in [?], see [?].

The first ingredient is a characterisation of the average pileup density in a given event in
terms of two variables, $\rho$ and $\rho_m$, such that the 4-vector of the expected pileup
deposition in a small region of size $\delta y\,\delta\phi$ can be written

$$\left[\rho\cos\phi,\ \rho\sin\phi,\ (\rho+\rho_m)\sinh y,\ (\rho+\rho_m)\cosh y\right]\delta y\,\delta\phi, \qquad (1)$$

where $\rho$ and $\rho_m$ have only weak dependence on $y$ (and $\phi$). Relative to the original
area-median proposal [23], a novelty here is the inclusion of a term $\rho_m$. It arises because
pileup consists of low-$p_t$ hadrons, and their masses are not negligible relative to their
$p_t$ (cf. also [?]). It is important mainly for observables sensitive to differences between
energy and 3-momentum, e.g. jet masses, as we will see below.
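As an illustration of Eq. (1), the following minimal Python sketch builds the expected pileup 4-vector for a small patch and shows that a non-zero $\rho_m$ gives the deposit a mass. The function names are hypothetical, and the numerical inputs simply scale the per-event values quoted later in the text ($\rho \simeq 770$ MeV, $\rho_m \simeq 125$ MeV) to 30 pileup events; they are assumptions for the example, not results.

```python
import math

def pileup_four_vector(rho, rho_m, y, phi, dy, dphi):
    """Expected pileup (px, py, pz, E) in a patch of size dy*dphi, as in Eq. (1)."""
    area = dy * dphi
    px = rho * math.cos(phi) * area
    py = rho * math.sin(phi) * area
    pz = (rho + rho_m) * math.sinh(y) * area
    energy = (rho + rho_m) * math.cosh(y) * area
    return px, py, pz, energy

def mass(p):
    """Invariant mass of a 4-vector (px, py, pz, E); clipped at zero against rounding."""
    px, py, pz, energy = p
    m2 = energy**2 - px**2 - py**2 - pz**2
    return math.sqrt(max(m2, 0.0))

# Illustrative inputs (assumed): 30 pileup events at the per-event densities quoted in the text.
rho, rho_m = 30 * 0.770, 30 * 0.125  # GeV per unit area
patch = pileup_four_vector(rho, rho_m, y=0.0, phi=0.0, dy=0.1, dphi=0.1)
print("pt =", math.hypot(patch[0], patch[1]), "GeV, mass =", mass(patch), "GeV")
```

With $\rho_m = 0$ the same patch would be massless, which is why the extra term matters mainly for observables such as jet masses.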

The second and main new ingredient is a determination, for a specific jet, of the shape's
sensitivity to pileup. Let the shape be defined by some function $V(\{p_i\}_{\rm jet})$ of the
momenta $p_i$ in the jet. Among these momenta, we include a set of "ghosts" [21], very low
momentum particles that cover the $y$-$\phi$ plane at high density, each of them mimicking a
pileup-like component in a region of area $A_g$. We then consider the derivatives of the jet
shape with respect to the transverse momentum scale, $p_{t,g}$, of the ghosts and with respect
to a component $m_{\delta,g} = \sqrt{m_g^2 + p_{t,g}^2} - p_{t,g}$,

$$V_{\rm jet}^{(n,m)} \equiv A_g^{n+m}\,\partial_{p_{t,g}}^{\,n}\,\partial_{m_{\delta,g}}^{\,m}\,V(\{p_i\}_{\rm jet}). \qquad (2)$$

The derivatives are to be evaluated at $p_{t,g} = m_{\delta,g} = 0$, and by scaling all ghost
momenta simultaneously.

Given the level of pileup, $\rho$, $\rho_m$, and the information on the derivatives, one can
then extrapolate the value of the jet's shape to zero pileup,

$$V_{\rm jet,sub} = V_{\rm jet} - \rho\,V_{\rm jet}^{(1,0)} - \rho_m V_{\rm jet}^{(0,1)} + \tfrac{1}{2}\rho^2 V_{\rm jet}^{(2,0)} + \tfrac{1}{2}\rho_m^2 V_{\rm jet}^{(0,2)} + \rho\,\rho_m V_{\rm jet}^{(1,1)} + \cdots, \qquad (3)$$

where the formula takes into account the fact that the
derivatives are evaluated for the jet including the pileup.
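The extrapolation of Eq. (3), truncated at second order, can be sketched in a few lines of Python. The function name and the dictionary layout for the derivatives are hypothetical choices made for this example. The toy check uses an assumed quadratic dependence of the shape on the two pileup densities, for which the second-order formula recovers the zero-pileup value exactly.

```python
def extrapolate_shape(v_jet, rho, rho_m, derivs):
    """Second-order truncation of Eq. (3).

    derivs maps (n, m) to V_jet^(n,m), evaluated for the jet including pileup;
    missing entries are treated as zero.
    """
    d = lambda n, m: derivs.get((n, m), 0.0)
    return (v_jet
            - rho * d(1, 0) - rho_m * d(0, 1)
            + 0.5 * rho**2 * d(2, 0) + 0.5 * rho_m**2 * d(0, 2)
            + rho * rho_m * d(1, 1))

# Toy model (assumed): V = a + b*r + c*s + d*r^2 + e*s^2 + f*r*s, with r, s the two densities.
a, b, c, d2, e2, f = 0.05, 1e-3, 2e-3, -1e-6, 3e-6, 2e-6
r0, s0 = 23.1, 3.75
v_with_pileup = a + b*r0 + c*s0 + d2*r0**2 + e2*s0**2 + f*r0*s0
toy_derivs = {
    (1, 0): b + 2*d2*r0 + f*s0,   # dV/dr at (r0, s0)
    (0, 1): c + 2*e2*s0 + f*r0,   # dV/ds at (r0, s0)
    (2, 0): 2*d2, (0, 2): 2*e2, (1, 1): f,
}
v_sub = extrapolate_shape(v_with_pileup, r0, s0, toy_derivs)
print(v_with_pileup, v_sub)
```

The minus signs on the first-order terms and plus signs on the second-order terms follow from Taylor-expanding about the point with pileup rather than about zero pileup.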

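Handling derivatives with respect to both $p_{t,g}$ and $m_{\delta,g}$ can be cumbersome in
practice. An alternative is to introduce a new variable $r_{t,g}$ and set $p_{t,g} = r_{t,g}$
and $m_{\delta,g} = \frac{\rho_m}{\rho}\,r_{t,g}$. We then take total derivatives with respect
to $r_{t,g}$,

$$V_{\rm jet}^{(n)} \equiv A_g^{n}\,\frac{d^n}{dr_{t,g}^{\,n}}\,V(\{p_i\}_{\rm jet}), \qquad (4)$$

so that the correction can be rewritten

$$V_{\rm jet,sub} = V_{\rm jet} - \rho\,V_{\rm jet}^{(1)} + \tfrac{1}{2}\rho^2 V_{\rm jet}^{(2)} + \cdots. \qquad (5)$$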

The derivatives $V_{\rm jet}^{(n,m)}$ or $V_{\rm jet}^{(n)}$ can be determined numerically, for
a specific jet, by rescaling the ghost momenta and reevaluating the jet shape for multiple
rescaled values. Typically this is more stable with Eq. (4)
and this is the approach we use below.
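The numerical procedure can be sketched as follows, under strong simplifying assumptions that are not part of the method itself: the jet axis is held fixed, the shape depends only on transverse momenta (so the $m_{\delta,g}$ component plays no role), pileup is an idealised uniform deposit sitting on the ghost grid, and the derivatives of Eq. (4) are taken with simple one-sided finite differences. All function names are hypothetical.

```python
import numpy as np

def angularity_from_dr(pt, dr, beta=1.0):
    """sum_i pt_i dR_i^beta / sum_i pt_i, with dR measured from a fixed jet axis."""
    return np.sum(pt * dr**beta) / np.sum(pt)

def ghost_grid(radius, spacing):
    """Distances to the axis of ghosts on a square grid inside a disc, and the ghost area A_g."""
    n = int(round(2 * radius / spacing))
    c = (np.arange(n) + 0.5) * spacing - radius
    yy, pp = np.meshgrid(c, c)
    dr = np.hypot(yy, pp).ravel()
    return dr[dr < radius], spacing**2

def shape_derivatives(shape, pt, dr, ghost_dr, a_g, h=1e-3):
    """A_g dV/dr_tg and A_g^2 d2V/dr_tg^2 at r_tg = 0, rescaling all ghosts together."""
    def v(r):
        all_pt = np.concatenate([pt, np.full(ghost_dr.shape, r)])
        all_dr = np.concatenate([dr, ghost_dr])
        return shape(all_pt, all_dr)
    v0, v1, v2 = v(0.0), v(h), v(2 * h)
    d1 = (-3 * v0 + 4 * v1 - v2) / (2 * h)   # one-sided first derivative
    d2 = (v0 - 2 * v1 + v2) / h**2           # one-sided second derivative
    return a_g * d1, a_g**2 * d2

# Toy jet (assumed): three hard particles plus uniform pileup of density rho.
hard_pt = np.array([400.0, 80.0, 20.0])
hard_dr = np.array([0.01, 0.15, 0.40])
ghost_dr, a_g = ghost_grid(radius=0.7, spacing=0.05)
rho = 23.0
pt = np.concatenate([hard_pt, np.full(ghost_dr.shape, rho * a_g)])
dr = np.concatenate([hard_dr, ghost_dr])

v_true = angularity_from_dr(hard_pt, hard_dr)
v_pu = angularity_from_dr(pt, dr)
d1, d2 = shape_derivatives(angularity_from_dr, pt, dr, ghost_dr, a_g)
v_sub1 = v_pu - rho * d1                  # first-order version of Eq. (5)
v_sub2 = v_sub1 + 0.5 * rho**2 * d2       # second-order version of Eq. (5)
print(v_true, v_pu, v_sub1, v_sub2)
```

In this idealised setting each additional order brings the corrected value closer to the pileup-free one; with realistic, fluctuating pileup the recovery would hold only on average.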

To investigate the performance of our correction procedure, we consider a number of jet shapes:

• Angularities [?, 34], adapted to hadron-collider jets as
$\theta^{(\beta)} = \sum_i p_{ti}\,\Delta R_{i,{\rm jet}}^{\beta} / \sum_i p_{ti}$, for
$\beta = 0.5, 1, 2, 3$; $\theta^{(1)}$, the "girth", "width" or "broadening" of the jet, has been
found to be particularly useful for quark/gluon discrimination [17, 35].

• Energy-energy-correlation (EEC) moments, advocated for their resummation simplicity in [36],
$E^{(\beta)} = \sum_{i,j} p_{ti}\,p_{tj}\,\Delta R_{ij}^{\beta} / (\sum_i p_{ti})^2$, using the
same set of $\beta$ values. EEC-related variables have been studied recently also in [37].

• "Subjettiness" ratios, designed for characterising multi-pronged jets [13]-[15]: one defines
the subjettiness
$\tau_N^{({\rm axes},\beta)} = \sum_i p_{ti}\,\min(\Delta R_{i1}, \ldots, \Delta R_{iN})^{\beta} / \sum_i p_{ti}$,
where $\Delta R_{ia}$ is the distance between particle $i$ and axis $a$, where $a$ runs from 1 to
$N$. One typically considers ratios such as $\tau_{21} = \tau_2/\tau_1$ and
$\tau_{32} = \tau_3/\tau_2$ (the latter used e.g. in a recent search for R-parity violating
gluino decays [38]); we consider $\beta = 1$ and $\beta = 2$, as well as two choices for
determining the axes: "kt", which exploits the $k_t$ algorithm [?] to decluster the jet to $N$
subjets and then uses their axes; and "1kt", which adjusts the "kt" axes so as to obtain a
single-pass approximate minimisation of $\tau_N$ [15].

• A longitudinally invariant version of the planar flow [11], [12], involving a $2 \times 2$
matrix $M_{\alpha\beta} = \sum_i p_{ti}\,(\alpha_i - \alpha_{\rm jet})(\beta_i - \beta_{\rm jet})$,
where $\alpha$ and $\beta$ correspond either to the rapidity $y$ or azimuth $\phi$; the planar
flow is then given by ${\rm Pf} = 4\lambda_1\lambda_2/(\lambda_1 + \lambda_2)^2$,
where $\lambda_{1,2}$ are the two eigenvalues of the matrix.
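The shape definitions listed above can be written compactly with NumPy. This is only an illustrative sketch: the function names are hypothetical, the EEC sum is taken over all ordered pairs $(i,j)$ (an assumption about the normalisation), and the subjettiness axes are passed in by hand, since finding "kt" or "1kt" axes requires a jet-clustering library and is not shown.

```python
import numpy as np

def delta_phi(a, b):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (a - b + np.pi) % (2 * np.pi) - np.pi

def angularity(pt, y, phi, y_jet, phi_jet, beta):
    dr = np.hypot(y - y_jet, delta_phi(phi, phi_jet))
    return np.sum(pt * dr**beta) / np.sum(pt)

def eec(pt, y, phi, beta):
    dr = np.hypot(y[:, None] - y[None, :], delta_phi(phi[:, None], phi[None, :]))
    return np.sum(pt[:, None] * pt[None, :] * dr**beta) / np.sum(pt)**2

def n_subjettiness(pt, y, phi, axes, beta):
    """axes: array of shape (N, 2) holding (y, phi) for each axis (supplied externally)."""
    dr = np.hypot(y[:, None] - axes[None, :, 0],
                  delta_phi(phi[:, None], axes[None, :, 1]))
    return np.sum(pt * dr.min(axis=1)**beta) / np.sum(pt)

def planar_flow(pt, y, phi, y_jet, phi_jet):
    d = np.stack([y - y_jet, delta_phi(phi, phi_jet)])  # shape (2, n_particles)
    m = (pt * d) @ d.T                                  # 2x2 matrix M_{alpha beta}
    l1, l2 = np.linalg.eigvalsh(m)
    return 4 * l1 * l2 / (l1 + l2)**2

# Toy configurations (assumed): a two-particle "line" and a four-particle "cross".
pt2 = np.array([1.0, 1.0]); y2 = np.array([0.1, -0.1]); phi2 = np.zeros(2)
pt4 = np.ones(4); y4 = np.array([0.1, -0.1, 0.0, 0.0]); phi4 = np.array([0.0, 0.0, 0.1, -0.1])
print(angularity(pt2, y2, phi2, 0.0, 0.0, 1.0), eec(pt2, y2, phi2, 1.0))
print(planar_flow(pt2, y2, phi2, 0.0, 0.0), planar_flow(pt4, y4, phi4, 0.0, 0.0))
```

The linear configuration gives a planar flow at the lower end of its range and the symmetric cross gives the upper end, matching the interpretation of Pf as a measure of how planar rather than linear the energy distribution is.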

One should be aware that observables constructed from ratios of shapes, such as
$\tau_{n,n-1}$ and planar flow, are not infrared and collinear (IRC) safe for generic jets. In
particular Pf and $\tau_{21}$ are IRC safe only when applied to jets with a structure of at
least two hard prongs, usually guaranteed by requiring the jets to have significant mass;
$\tau_{32}$ requires a hard three-pronged structure,³ a condition not imposed in previous work,
and that we will apply here through a cut on $\tau_{21}$.



³ Consider a jet consisting instead of just two hard particles with $p_t = 1000$ GeV, with
$\phi = 0, 0.5$, and two further soft particles with $p_t = \epsilon$, at $\phi = 0.05, 0.1$,
all particles having $y = 0$. It is straightforward to see that $\tau_{32}$ is finite and
independent of $\epsilon$ for $\epsilon \to 0$, which results in an infinite leading-order
perturbative distribution for $\tau_{32}$.
FIG. 1: Impact of pileup and subtraction on various jet-shape distributions and their averages, in dijet, $WW$ and $t\bar t$ production
processes. The distributions are shown for Poisson distributed pileup (with an average of 30 pileup events) and the averages
are shown as a function of the number of pileup events, $n_{\rm PU}$. The shapes are calculated for jets with $p_t > 500$ GeV (the cut is
applied before adding pileup, as are the cuts on the jet mass $m_J$ and subjettiness ratio $\tau_{21}$ where relevant).



For the angularities and EEC moments we have verified that the first two numerically-obtained
derivatives agree with analytical calculations in the case of a jet consisting of a single hard
particle. For variables like $\tau_N$ that involve a partition of a jet, one subtlety is that
the partitioning can change as the ghost momenta are varied to evaluate the numerical
derivative. The resulting discontinuities (or non-smoothness) in the observable's value would
then result in nonsensical estimates of the derivatives. We find no such issue in our numerical
method to evaluate the derivatives, but were it to arise, one could choose to force a fixed
partitioning.

To test the method in simulated events with pileup, we use Pythia 8.165, tune 4C [?, 42]. We
consider 3 hard event samples: dijet, $WW$ and $t\bar t$ production, with hadronic $W$ decays,
all with underlying event (UE) turned off (were it turned on, the subtraction procedure would
remove it too). We use anti-$k_t$ jets [43] with $R = 0.7$, taking only those with
$p_t > 500$ GeV (before addition of pileup). All jet-finding is performed with FastJet 3.0 [44].
The determination of $\rho$ and $\rho_m$ for each event follows the area-median approach [23]:
the event is broken into patches and in each patch one evaluates
$p_{t,{\rm patch}} = \sum_{i \in {\rm patch}} p_{t,i}$, as well as
$m_{\delta,{\rm patch}} = \sum_{i \in {\rm patch}} \left(\sqrt{m_i^2 + p_{t,i}^2} - p_{t,i}\right)$,
where the sum runs over particles $i$ in the patch. Then $\rho$ and $\rho_m$ are given by

$$\rho = \underset{\rm patches}{\rm median}\left\{\frac{p_{t,{\rm patch}}}{A_{\rm patch}}\right\}, \qquad \rho_m = \underset{\rm patches}{\rm median}\left\{\frac{m_{\delta,{\rm patch}}}{A_{\rm patch}}\right\}, \qquad (6)$$

where $A_{\rm patch}$ is the area of each patch. To obtain the patches we cluster the event with
the $k_t$ algorithm with $R = 0.4$. For non-zero $\rho_m$ the formula for correcting a jet's
4-momentum is

$$p^{\mu}_{\rm jet,sub} = p^{\mu}_{\rm jet} - \left[\rho A^{x}_{\rm jet},\ \rho A^{y}_{\rm jet},\ (\rho+\rho_m) A^{z}_{\rm jet},\ (\rho+\rho_m) A^{E}_{\rm jet}\right],$$

with the area 4-vector, $A^{\mu}$, as defined in [21].
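A minimal sketch of Eq. (6) and of the 4-momentum correction is given below. It assumes the patches have already been obtained elsewhere (in the text, from $k_t$ clustering with $R = 0.4$ in FastJet, which is not reproduced here) and are supplied as plain arrays; the function names and the toy numbers are hypothetical.

```python
import numpy as np

def estimate_rho(patches):
    """Median-based rho and rho_m, as in Eq. (6).

    patches: iterable of (pt_array, mass_array, area) for each patch.
    """
    pt_density, md_density = [], []
    for pt, m, area in patches:
        pt = np.asarray(pt, dtype=float)
        m = np.asarray(m, dtype=float)
        pt_density.append(np.sum(pt) / area)
        md_density.append(np.sum(np.sqrt(m**2 + pt**2) - pt) / area)
    return float(np.median(pt_density)), float(np.median(md_density))

def subtract_four_momentum(p_jet, area4, rho, rho_m):
    """Corrected (px, py, pz, E): rho scales the x, y area components, rho + rho_m the z, E ones."""
    ax, ay, az, ae = area4
    correction = np.array([rho * ax, rho * ay, (rho + rho_m) * az, (rho + rho_m) * ae])
    return np.asarray(p_jet, dtype=float) - correction

# Toy event (assumed): two soft patches and one patch dominated by a hard particle.
patches = [
    ([3.0], [4.0], 1.0),
    ([3.0, 3.0], [4.0, 4.0], 2.0),
    ([300.0], [0.0], 1.0),
]
rho, rho_m = estimate_rho(patches)
print(rho, rho_m)
print(subtract_four_momentum([100.0, 0.0, 0.0, 100.0], [1.0, 0.0, 0.0, 1.0], rho, rho_m))
```

Using the median rather than the mean is what keeps the hard patch in the toy event from biasing the density estimates.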

We have 17 observables and 3 event samples. Fig. 1 gives a representative subset of the
resulting 51 distributions, showing in each case the distribution (and average) for the shape
without pileup (solid green line), the result with pileup (dashed line) and the impact of
subtracting first and second derivatives (dotted and solid black lines respectively). The plots
for the distributions have been generated using a Poisson distribution of pileup events with an
average of 30 events (our count includes diffractive and elastic events, and the analysis uses
all particles from the event generator, leading to $\rho \simeq 770$ MeV and
$\rho_m \simeq 125$ MeV per pileup event at central rapidities).

FIG. 2: Left: rate for tagging quark and gluon jets using a fixed cut on the jet width, shown as a function of the number of
pileup vertices. Middle: filtered jet-mass distribution for fat jets in $t\bar t$ events, showing the impact of the $\rho$ and $\rho_m$ components
of the subtraction. Right: tagging rate of an $N$-subjettiness top tagger for $t\bar t$ signal and dijet background as a function of the
number of pileup vertices. All cuts are applied after addition (and possible subtraction) of pileup. Subtraction acts on $\tau_1$, $\tau_2$
and $\tau_3$ individually. See text for further details.

For nearly all the jet shapes, the pileup has a substantial impact, shifting the average values
by up to 50-100% (as compared to a 5-10% effect on the jet $p_t$). The subtraction performs
adequately: the averaged subtracted results for the shapes usually return very close to their
original values, with the second derivative playing a small but sometimes relevant role. The
tails of the distributions are generally well recovered; however, intra-jet pileup fluctuations
cause sharp peaks to be somewhat broadened. These cannot be corrected for without applying some
form of noise reduction, which would however also tend to introduce a bias. Of the 51
combinations of observable and process that we examined, most were of similar quality to those
illustrated in Fig. 1, with the broadening of narrow peaks found to be more extreme for larger
$\beta$ values. The one case where the subtraction procedure failed was the planar flow for
(hadronic) $WW$ events: here the impact of pileup is dramatic, transforming a peak near the
lower boundary of the shape's range, ${\rm Pf} = 0$, into a peak near its upper boundary,
${\rm Pf} = 1$ (bottom-right plot of Fig. 1). This is an example where one cannot view the
pileup as simply "perturbing" the jet shape, in part because of intrinsic large non-linearities
in the shape's behaviour; with our particular set of $p_t$ cuts and jet definition, the use of
the small-$\rho$ expansion of Eq. (5) fails to adequately correct the planar flow for more than
about 15 pileup events.

Next, we consider the use of the subtraction approach in the context of quark/gluon
discrimination. In a study of a large number of shapes, Ref. [17] found the jet girth or
broadening, $\theta^{(1)}$, to be the most effective single infrared and collinear safe
quark/gluon discriminator. Fig. 2 (left) shows the fraction of quark and gluon-induced jets that
pass a fixed cut of $\theta^{(1)} < 0.05$ as a function of the level of pileup: pileup radically
changes the impact of the cut, while after subtraction the q/g discrimination returns to its
original behaviour.

Our last test involves top tagging, which we illustrate on $R = 1$ anti-$k_t$ jets using cuts on
the "filtered" jet mass and on the $\tau_{32}$ subjettiness ratio. The filtering selects the 4
hardest $R_{\rm filt} = 0.25$ Cambridge/Aachen [45] subjets after pileup subtraction. The
distribution of filtered jet mass is shown in Fig. 2 (middle), illustrating that the subtraction
mostly recovers the original distribution and that $\rho_m$ is as important as $\rho$ (specific
treatments of hadron masses, e.g. setting them to zero, may limit the impact of $\rho_m$ in an
experimental context). The tagger itself consists of cuts on $\tau_{32} < 0.6$,
$\tau_{21} > 0.15$ and a requirement that the filtered [6] jet mass be between 150 and 200 GeV.
The rightmost plot of Fig. 2 shows the final tagging efficiencies for hadronic top quarks and
for generic dijets as a function of the number of pileup events. Pileup has a huge impact on the
tagging, but most of the original performance is restored after subtraction.
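The tagger's selection logic can be expressed as a simple predicate. The sketch below assumes the inputs are the individually subtracted $\tau_1$, $\tau_2$, $\tau_3$ and the filtered jet mass in GeV, takes the mass window as inclusive, and rejects jets with non-positive $\tau_1$ or $\tau_2$; these edge-case choices and the function name are assumptions for illustration.

```python
def passes_top_tag(tau1, tau2, tau3, filtered_mass):
    """Cuts quoted in the text: tau32 < 0.6, tau21 > 0.15, 150 <= m_filtered <= 200 GeV."""
    if tau1 <= 0.0 or tau2 <= 0.0:
        return False  # ratios undefined; treated as failing (assumed convention)
    tau32 = tau3 / tau2
    tau21 = tau2 / tau1
    return tau32 < 0.6 and tau21 > 0.15 and 150.0 <= filtered_mass <= 200.0

# Toy inputs (assumed): a three-prong-like jet near the top mass, and a two-prong-like jet.
print(passes_top_tag(0.40, 0.20, 0.08, 172.0))
print(passes_top_tag(0.40, 0.20, 0.18, 172.0))
```

Forming the ratios from separately corrected $\tau_N$ values, rather than correcting the ratio directly, mirrors the statement in the Fig. 2 caption that subtraction acts on each $\tau_N$ individually.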

To conclude, this letter has shown how most jet shapes 
can be straightforwardly corrected for the effects of 
pileup. The corrections allow shape-based jet substruc- 
ture analyses to continue to perform well even in the pres- 
ence of up to 60 pileup events, notably when combined 
with the corrections introduced here for hadron masses 
in pileup. This progress will help ensure the viability of 
a broad range of jet substructure tools, shape-based and 
subjet-based, in high-luminosity LHC running. 

The software for the general shape subtraction approach presented here will be made available
as part of the FastJet Contrib project [46].